11/13/2021

Introduction

Increasing sales and profit margins is one of the most important priorities for business owners. For this reason, the owner of a local cafe asked me to analyze three datasets to find ways to increase the number of customers and, in turn, profits.

Research questions

How can a cafe increase its profits?

To answer this question, discover new information, or confirm ideas the owner already has, I plan to:

  • Discover which days have the most customers, using a time series of the visitor data.
  • Discover which items sell best, using the items data.
  • Look for trends to find out which hours, days, and months have the most customers.

The source and structure of the data

Visit Data

The visit data comes from a counting device that the cafe owner installed above the main door. The device counts visitors and records a few extra variables. The data has 6 variables and 8089 observations.

Variable          Description
day               The date
Time              The hour of the day
ValueIn           The number of visitors entering the cafe
ValueOut          The number of visitors leaving the cafe
Turn In Rate(%)   The percentage of passers-by who enter the cafe
OutsideTraffic    The number of people passing outside the cafe

Items Data

The items data comes from the sales website and has 5 variables and 96 observations.

Variable   Description
item       The name of the item
count      The number of pieces sold
price      The overall price
cost       The cost of the items
profits    The amount of money earned

Sales Data

The sales data comes directly from the sales website the owner uses. It has 5 variables and 338 observations.

Variable      Description
Total sales   The total value of sales
Items cost    The cost of the sold items
Taxes         Additional fees (taxes)
Offers        Discounts and special offers
Profits       The amount of money earned

Assumptions

  1. In the visit data, the variables are positively skewed because the data is hourly, so many hours record zero or one visitor.

  2. There is not enough data for a complete analysis, since the cafe has been open for less than a year.

EDA

  • After loading the data and exploring the distributions of the variables, I removed the variable ValueOut because it adds no information about the number of visitors.

  • The large number of zeros is due to the hourly granularity of the data.
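These two EDA steps can be sketched in Python: drop ValueOut from each hourly record, then measure the share of hours with zero visitors. This is a minimal sketch. It assumes each row is a dictionary keyed by the column names from the visit data table. The rows are made-up toy values, not the cafe's data.

```python
def drop_column(rows, column):
    """Return copies of the records without the given column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]

def zero_share(rows, column):
    """Fraction of records where the column equals zero."""
    if not rows:
        return 0.0
    zeros = sum(1 for row in rows if row[column] == 0)
    return zeros / len(rows)

# Toy hourly records (made-up values, not the cafe's data)
visits = [
    {"day": "2021-06-05", "Time": 3, "ValueIn": 0, "ValueOut": 0},
    {"day": "2021-06-05", "Time": 4, "ValueIn": 0, "ValueOut": 0},
    {"day": "2021-06-05", "Time": 21, "ValueIn": 14, "ValueOut": 9},
    {"day": "2021-06-05", "Time": 22, "ValueIn": 18, "ValueOut": 15},
]

visits = drop_column(visits, "ValueOut")
print(sorted(visits[0]))              # ['Time', 'ValueIn', 'day']
print(zero_share(visits, "ValueIn"))  # 0.5
```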

Important features using Decision Trees

  • Time and date are the most important variables.
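Regression trees rank features by how much their splits reduce the squared error of the target. The Python below sketches that idea in simplified form and is not the exact model I fitted. It scores each feature by its best single split (a decision stump) on the visitor count, then normalizes the scores so they sum to 1. The feature names Time and weekday (a day-of-week number derived from the date) and the toy rows are assumptions for illustration.

```python
from statistics import pvariance

def sse(values):
    """Sum of squared deviations from the mean."""
    return pvariance(values) * len(values)

def best_split_gain(xs, ys):
    """Largest drop in squared error from one threshold split on feature xs."""
    total = sse(ys)
    best = 0.0
    # Every distinct value except the largest is a candidate threshold,
    # so both sides of the split are always non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        best = max(best, total - sse(left) - sse(right))
    return best

def stump_importances(features, target):
    """Normalize each feature's best split gain so the scores sum to 1."""
    gains = {name: best_split_gain(xs, target) for name, xs in features.items()}
    total = sum(gains.values())
    return {name: (g / total if total else 0.0) for name, g in gains.items()}

# Toy data (not the cafe's): visitor counts rise sharply in the evening
features = {
    "Time": [8, 9, 13, 14, 21, 22],
    "weekday": [0, 5, 0, 5, 0, 5],  # 0 = Monday, 5 = Saturday
}
visitors = [2, 3, 4, 5, 15, 17]

importances = stump_importances(features, visitors)
print(importances)
```

A full decision tree repeats this split search recursively and sums the gains per feature across all of its nodes. The single stump only keeps the example short.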

The effect of the hour on visitors count

  • There are many customers in the evening, from 19:00 until midnight, and the highest numbers of visitors are at 22:00 and 23:00.
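This hourly pattern can be reproduced by averaging ValueIn for each hour of the day and ranking the hours. Below is a minimal Python sketch. It assumes rows keyed by the Time and ValueIn columns. The helper names mean_by_key and top_keys are hypothetical, and the rows are toy values.

```python
from collections import defaultdict

def mean_by_key(rows, key, value):
    """Average of rows[value] for each distinct rows[key]."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
        counts[row[key]] += 1
    return {k: totals[k] / counts[k] for k in totals}

def top_keys(means, n):
    """The n keys with the highest average, highest first."""
    return sorted(means, key=means.get, reverse=True)[:n]

# Toy hourly rows (made-up values)
visits = [
    {"Time": 9, "ValueIn": 3}, {"Time": 9, "ValueIn": 5},
    {"Time": 19, "ValueIn": 10}, {"Time": 19, "ValueIn": 12},
    {"Time": 22, "ValueIn": 20}, {"Time": 22, "ValueIn": 16},
    {"Time": 23, "ValueIn": 15}, {"Time": 23, "ValueIn": 17},
]
hourly = mean_by_key(visits, "Time", "ValueIn")
print(hourly)               # {9: 4.0, 19: 11.0, 22: 18.0, 23: 16.0}
print(top_keys(hourly, 2))  # [22, 23]
```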

The effect of the day on visitors count

  • Weekends have the highest number of visitors.
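The weekend comparison can be sketched in Python in two steps: sum the hourly ValueIn counts into daily totals, then compare the mean daily total on weekend days with the mean on weekdays. This sketch makes several assumptions. The day column is an ISO date string, and the weekend is Saturday and Sunday. The helper names and toy rows are hypothetical.

```python
from collections import defaultdict
from datetime import date

def daily_totals(rows):
    """Sum hourly ValueIn counts into one total per date."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["day"]] += row["ValueIn"]
    return dict(totals)

def is_weekend(day):
    """True for Saturday (5) and Sunday (6) in date.weekday() numbering."""
    return date.fromisoformat(day).weekday() >= 5

def weekend_vs_weekday(totals):
    """Mean daily visitors on weekend days and on weekdays."""
    weekend = [v for d, v in totals.items() if is_weekend(d)]
    weekday = [v for d, v in totals.items() if not is_weekend(d)]
    return sum(weekend) / len(weekend), sum(weekday) / len(weekday)

# Toy hourly rows: 2021-06-05 and 2021-06-06 are a Saturday and a Sunday
visits = [
    {"day": "2021-06-05", "ValueIn": 100}, {"day": "2021-06-05", "ValueIn": 80},
    {"day": "2021-06-06", "ValueIn": 200},
    {"day": "2021-06-07", "ValueIn": 50}, {"day": "2021-06-07", "ValueIn": 40},
    {"day": "2021-06-08", "ValueIn": 110},
]
print(weekend_vs_weekday(daily_totals(visits)))  # (190.0, 100.0)
```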

Discovering the best-selling items using the items data

## [1] "Flat white"
## [1] 64124.9

Flat White is the best seller and most profitable item!
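Finding the best seller and the most profitable item comes down to taking the maximum of the count column and of the profits column. Below is a Python sketch of that lookup. It assumes the items table is loaded as a list of dictionaries with the column names from the items data table. The item names and numbers are placeholders, not the cafe's figures.

```python
def top_item(items, column):
    """Name of the item with the largest value in the given column."""
    return max(items, key=lambda row: row[column])["item"]

# Placeholder rows (not the cafe's data)
items = [
    {"item": "item_a", "count": 120, "profits": 300.0},
    {"item": "item_b", "count": 450, "profits": 900.0},
    {"item": "item_c", "count": 80, "profits": -15.0},
]
print(top_item(items, "count"), top_item(items, "profits"))  # item_b item_b
```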

Items with more than 2,000 pieces sold in 11 months

## [1] "Flat white"                  "Aqua Carpathica Water 330ml"
## [3] "Hot latte"                   "Cappuccino"

The best-selling items are all drinks, and most of them are hot drinks. Customers seem to order hot drinks even in very hot weather (at least water is on the list).
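The list above is a simple threshold filter on the count column. Below is a minimal Python sketch that also sorts the result from best to worst seller. The function name items_sold_over and the placeholder rows are assumptions. The threshold is strict ("more than"), so an item with exactly 2,000 pieces is excluded.

```python
def items_sold_over(items, threshold):
    """Names of items whose sold count exceeds the threshold, best sellers first."""
    selected = [row for row in items if row["count"] > threshold]
    selected.sort(key=lambda row: row["count"], reverse=True)
    return [row["item"] for row in selected]

# Placeholder rows (not the cafe's data)
items = [
    {"item": "item_a", "count": 2500},
    {"item": "item_b", "count": 1800},
    {"item": "item_c", "count": 3100},
    {"item": "item_d", "count": 2000},
]
print(items_sold_over(items, 2000))  # ['item_c', 'item_a']
```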

Which items are not profitable?

## [1] "hot coconut latte"

  • It may be worth replacing or removing this drink.
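One way to flag unprofitable items is to compare each item's price with its cost. The Python sketch below assumes that profit equals price minus cost, and it treats break-even items (profit of exactly zero) as not profitable. Both are assumptions about how the items table was built. The rows are placeholders.

```python
def unprofitable_items(items):
    """Names of items whose price minus cost is zero or negative."""
    return [row["item"] for row in items if row["price"] - row["cost"] <= 0]

# Placeholder rows (not the cafe's data)
items = [
    {"item": "item_a", "price": 500.0, "cost": 300.0},
    {"item": "item_b", "price": 120.0, "cost": 140.0},
    {"item": "item_c", "price": 90.0, "cost": 90.0},
]
print(unprofitable_items(items))  # ['item_b', 'item_c']
```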

Trend

Sales Data

  • All the variables are positively skewed, with a large number of zeros due to the hours when the cafe is closed.

Trend

  • The data shows a pattern that repeats over time.
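A common way to see the trend beneath a repeating weekly pattern is a 7-day moving average. Each window covers one full week, so the weekly ups and downs cancel out and only the longer-term direction remains. The Python below is a minimal sketch of a trailing moving average on a toy daily series, not the cafe's sales. The toy series has a weekly pattern but no trend, so the smoothed values come out flat.

```python
def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Toy daily sales: the same weekly pattern repeated for two weeks
daily_sales = [10, 10, 10, 10, 10, 30, 30] * 2
trend = moving_average(daily_sales, 7)
print(len(trend), len(set(trend)))  # 8 1
```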

Daily Trend

Seasonality

  • Seasonality is a pattern that repeats within a year. We cannot detect it in our data because we do not have records for a whole year.

As a result, no yearly seasonality can be detected in my data.
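Classical seasonal decomposition generally needs at least two complete cycles of the seasonal period. For yearly seasonality, that means at least two years of data. The check below is a small Python sketch of that rule. The start and end dates are hypothetical and are not the cafe's actual opening date or last record.

```python
from datetime import date

def full_cycles(start, end, period_days):
    """Number of complete seasonal periods covered by the dates start..end."""
    span_days = (end - start).days + 1  # count both endpoints
    return span_days // period_days

def can_estimate_seasonality(start, end, period_days, min_cycles=2):
    """True if the range holds at least min_cycles complete periods."""
    return full_cycles(start, end, period_days) >= min_cycles

# Hypothetical date range of about 11 months
start, end = date(2021, 1, 1), date(2021, 11, 30)
print(full_cycles(start, end, 365))               # 0
print(can_estimate_seasonality(start, end, 365))  # False
```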

Challenges

  • Building a time series model on a short period of data is difficult, especially with disruptions such as the COVID-19 lockdown.
  • The time series does not cover a whole year, which prevented me from examining seasonality. I tried several ways to check for it, but none of them worked.

Conclusion

To increase the number of visitors, and thus profits, I suggest focusing on weekends, since they attract the most visitors. Removing unprofitable products from the menu and offering more profitable ones may also help.

Outlook for future development

In the future, I plan to deploy the model and feed it new data once the cafe has been open for a full year.

Limitations & problems

The main problem I faced, as mentioned above, is that no seasonality could be detected in the data. I think this is due to the COVID-19 lockdown and the limit on the number of customers allowed inside the cafe. The main limitation is the data itself, since it covers less than a year.

Thank you for listening

  • Any questions?